covered in Chapter 24). So the test statistic from this test should follow the chi-square distribution.
Now it is obvious why it is named the chi-square test! The next step is to obtain the p value for the test
statistic. To do that manually, you would look up the test statistic (which is 8.81 in our case) in a chi-
square table.
In actuality, the chi-square distribution refers to a family of distributions. Which chi-square
distribution you are using depends upon a number called the degrees of freedom, abbreviated d.f.
or df or by the Greek lowercase letter nu (ν) (in this book we use df). The df is a measure of the
independence among the quantities that make up the test statistic: it counts how many of them are
free to vary once the row and column totals of the cross-tab are fixed.
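To see how much the choice of df matters, here is a minimal R sketch that evaluates the same test statistic (8.81) against several members of the chi-square family. The variable names are ours, chosen only for illustration; pchisq() is the base R function for chi-square probabilities.

```r
# Upper-tail probability of the same statistic under different df values
test_statistic <- 8.81
df_values <- 1:4

# pchisq() is vectorized over df, so this returns one p value per df
p_values <- pchisq(test_statistic, df = df_values, lower.tail = FALSE)

# Show the results side by side
print(data.frame(df = df_values, p = signif(p_values, 3)))
```

The p value grows as the df increases, so looking up the statistic under the wrong df can change your conclusion.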
How would you calculate the df for a chi-square test? The answer is it depends on the number of rows
and columns in the cross-tab. For the 2 × 2 cross-tab (fourfold table) in this example, you added up
the four values in Figure 12-5, so you may think that you should look up the 8.81 chi-square value
with 4 df. But you’d
be wrong. Note the italicized word independence in the preceding paragraph. And keep in mind that
the differences (observed minus expected) in any row or column always add up to zero. The four terms making up the
8.81 total aren’t independent of each other. It turns out that the chi-square test statistic for a fourfold
table has only 1 df, not 4. In general, an N-by-M table, with N rows, M columns, and therefore N × M
cells, has only (N-1) × (M-1) df because of the constraints on the row and column sums. In our case,
N — which is the number of rows — is 2, so N-1 is 1. Also, M — which is the number of columns —
is 2, so M-1 is 1 also (and 1 times 1 is 1). Don’t feel bad if this wrinkle caught you by surprise —
even Karl Pearson, who invented the chi-square test, got that part wrong!
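If you want to see the constraint for yourself, the following R sketch may help. It uses a made-up 2-by-2 table (these counts are hypothetical and are not the CBD/NSAID data from this chapter) to show that the observed-minus-expected differences cancel out in every row and column, and it computes the df from the table's dimensions. All variable names are our own.

```r
# Hypothetical 2-by-2 counts (NOT the CBD/NSAID data from the text)
observed <- matrix(c(30, 10,
                     20, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(treatment = c("A", "B"),
                                   outcome = c("relief", "no relief")))

# Expected count for each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Observed minus expected: every row and every column sums to zero
deviations <- observed - expected
row_sums <- rowSums(deviations)
col_sums <- colSums(deviations)

# Degrees of freedom for an N-by-M table: (N-1) * (M-1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

print(deviations)
cat("row sums:", row_sums, "| column sums:", col_sums, "| df:", deg_free, "\n")
```

Because the differences must cancel, knowing one cell's difference in a fourfold table fixes the other three, which is why only 1 df remains.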
So, if you were to manually look up the chi-square test statistic of 8.81 in a chi-square table, you
would have to look under the distribution for 1 df to find out the p value. Alternatively, if you got this
far and you wanted to use the statistical software R to look up the p value, you would use the following
code: pchisq(8.81, 1, lower.tail = FALSE). Either way, the p value for chi-square = 8.81, with 1 df, is
0.003. This means that if CBD and NSAIDs truly did not differ, there’s only a 0.003 probability that
random fluctuations alone could produce an effect at least as large as the one seen, where CBD
performs so differently than NSAIDs with respect to pain relief in chronic
arthritis patients. A 0.003 probability is the same as 1 chance in 333 (because 1/0.003 ≈ 333), meaning
very unlikely, but not impossible. So, if you set α = 0.05, because 0.003 < 0.05, your conclusion would
be that in the chronic arthritis patients in our sample, whether the participant took CBD or NSAIDs
was statistically significantly associated with whether or not they felt pain relief.
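The lookup and the decision described above can be sketched in R as follows. This is only an illustration built around the pchisq() call shown earlier; the variable names are ours, and the critical-value comparison with qchisq() is an equivalent check that we added, not a step from the text.

```r
# Values taken from the text
test_statistic <- 8.81
deg_free <- 1
alpha <- 0.05

# Upper-tail area of the chi-square distribution with 1 df
p_value <- pchisq(test_statistic, df = deg_free, lower.tail = FALSE)

# Express the p value as "1 chance in ..."
one_in <- 1 / p_value

# Equivalent check: compare the statistic with the critical value for alpha
critical_value <- qchisq(1 - alpha, df = deg_free)

significant <- p_value < alpha
cat("p =", round(p_value, 3), "| about 1 chance in", round(one_in), "\n")
cat("critical value =", round(critical_value, 2), "| significant:", significant, "\n")
```

Comparing the p value with α and comparing the test statistic with the critical value always lead to the same conclusion; the p value simply tells you how far past the cutoff you are.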
Putting it all together with some notation and formulas
The calculations of the Pearson chi-square test can be summarized concisely using the cell-
naming conventions shown in Figure 12-6, along with the standard summation notation described
in Chapter 2.